Gaussian process style transfer mapping for historical Chinese character recognition
Authors
Abstract
Historical Chinese character recognition is very important for large-scale historical document digitization, but it is a very challenging problem due to the lack of labeled training samples. This paper proposes a novel nonlinear transfer learning method, namely Gaussian Process Style Transfer Mapping (GP-STM). GP-STM extends the traditional linear Style Transfer Mapping (STM) by using Gaussian processes and kernel methods. With GP-STM, existing printed Chinese character samples are used to help recognize historical Chinese characters. To demonstrate this framework, we compare feature extraction methods, train a modified quadratic discriminant function (MQDF) classifier on printed Chinese character samples, and apply the GP-STM model to Dunhuang historical documents. Various kernels and parameters are explored, and the impact of the number of training samples is evaluated. Experimental results show that GP-STM increases accuracy by nearly 15 percentage points (from 42.8% to 57.5%), an improvement of more than 8 percentage points over the STM approach (from 49.2% to 57.5%).
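The abstract does not spell out the STM or GP-STM equations, so the following is only a minimal NumPy sketch, under our own assumptions, of the contrast it describes. The linear map is an affine transform x ↦ Ax + b fitted by regularized least squares that pulls A toward the identity and b toward zero; the nonlinear map adds the posterior mean of a Gaussian process fitted to the residual between paired source and target features. The names `fit_linear_stm`, `apply_linear_stm`, `rbf_kernel`, and `GPStyleMap`, the RBF kernel, the weights `beta`, `gamma`, `noise_var`, and the synthetic data are all hypothetical illustrations, not the paper's formulation.

```python
import numpy as np


def fit_linear_stm(S, T, beta=1.0, gamma=1.0):
    """Fit an affine style map x -> A x + b, regularized toward the identity.

    Minimizes sum_i ||A s_i + b - t_i||^2 + beta*||A - I||_F^2 + gamma*||b||^2.
    Writing A = I + D turns this into ridge regression of the residual T - S.
    S, T: (n, d) arrays of paired source/target feature vectors.
    """
    n, d = S.shape
    S_aug = np.hstack([S, np.ones((n, 1))])            # (n, d+1), bias column last
    R = T - S                                           # what the map must add
    reg = np.diag(np.concatenate([np.full(d, beta), [gamma]]))
    W = np.linalg.solve(S_aug.T @ S_aug + reg, S_aug.T @ R)  # (d+1, d) = [D^T; b^T]
    A = np.eye(d) + W[:d].T
    b = W[d]
    return A, b


def apply_linear_stm(X, A, b):
    """Map each row of X with the fitted affine transform."""
    return np.asarray(X, dtype=float) @ A.T + b


def rbf_kernel(X, Y, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel k(x, y) = v * exp(-||x - y||^2 / (2 l^2))."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


class GPStyleMap:
    """Hypothetical nonlinear map x -> x + m(x), m = GP posterior mean of T - S."""

    def __init__(self, length_scale=1.0, variance=1.0, noise_var=1e-2):
        self.length_scale = length_scale
        self.variance = variance
        self.noise_var = noise_var

    def fit(self, S, T):
        self.S = np.asarray(S, dtype=float)
        K = rbf_kernel(self.S, self.S, self.length_scale, self.variance)
        K += self.noise_var * np.eye(len(self.S))       # observation noise on the diagonal
        # alpha = (K + sigma^2 I)^{-1} (T - S), one column per output dimension
        self.alpha = np.linalg.solve(K, np.asarray(T, dtype=float) - self.S)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        Ks = rbf_kernel(X, self.S, self.length_scale, self.variance)
        return X + Ks @ self.alpha                      # identity plus kernel-smoothed correction


# Usage on synthetic data (hypothetical stand-ins for historical/printed features).
rng = np.random.default_rng(0)
S = rng.normal(size=(50, 2))      # "historical-style" source features
T = S + np.sin(S)                 # nonlinear shift to "printed-style" targets
A, b = fit_linear_stm(S, T, beta=1e-3, gamma=1e-3)
gp = GPStyleMap(length_scale=1.0, noise_var=1e-6).fit(S, T)
lin_err = np.abs(apply_linear_stm(S, A, b) - T).max()
gp_err = np.abs(gp.transform(S) - T).max()
print(f"linear STM max error: {lin_err:.3f}, GP map max error: {gp_err:.3f}")
```

Modeling the residual `T - S` rather than `T` itself means both maps fall back to the identity when evidence is weak (large `beta`/`gamma` for the linear map, test points far from the training data for the GP). That mirrors STM's pull toward A = I, but it is our assumption about how a GP-based extension might be set up. In STM-based adaptation the targets are commonly taken from the classifier's class prototypes (for example, MQDF class means); here `T` is simply any paired target array.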
Similar resources
Effect of Pixel’s Spatial Characteristics on Recognition of Isolated Pixelized Chinese Character
The influence of pixels' spatial characteristics on the recognition of isolated Chinese characters was investigated using simulated prosthetic vision. The accuracy of Chinese character recognition with 4 pixel counts (6*6, 8*8, 10*10, and 12*12 pixel arrays), 3 pixel shapes (Square, Dot, and Gaussian), and different pixel spacings was tested through a head-mounted display (HMD). A ca...
Problems and Approaches for Oriental Document Analysis
Machine understanding of hand-filled documents in China, Japan, and Korea requires not only general solutions for document analysis but also the ability to handle the peculiarities of the Oriental languages. As expected, handwritten Chinese character recognition is the major task. In addition, Japanese Kana, Korean Hangul, the Roman alphabet, and numerals are targets of recognition. The main dif...
Unicode Chinese paleography: making the evolutionary leap from bone, bronze, silk, and paper, to electronic bits. Dr. Richard
As more and more rare characters are encoded, Unicode provides better and better support for Chinese. In conjunction with CDL technology, texts of many historical periods can now be digitized with unparalleled accuracy. For even greater accuracy, a move beyond the encoding of modern-style CJK characters is required, and specialists from all over the world have begun to express interest in worki...
Handwritten Chinese Character Recognition Using Spatial Gabor Filters and Self-Organizing Feature Maps
So far, the bottleneck of Chinese character recognition, especially handwritten character recognition, still lies in the effectiveness of feature extraction in coping with various distortions and position shifts. In this paper, a novel method is proposed that applies a set of Gabor spatial filters with different directions and spatial frequencies to character images, in an effort to reach the optim...